Improving Retrievability and Recall by Automatic Corpus Partitioning
Abstract
With increasing volumes of data, much effort has been devoted to finding the most suitable answer to an information need. However, in many domains, the question of whether any specific information item can be found at all via a reasonable set of queries is essential. This concept of retrievability of information has evolved into an important evaluation measure of IR systems in recall-oriented application domains. While several studies have evaluated retrieval bias in systems, solid validation of the impact of retrieval bias and the development of methods to counter the low retrievability of certain document types would be desirable. This paper provides an in-depth study of retrievability characteristics over queries of different lengths in a large benchmark corpus, validating previous studies. It analyzes the possibility of automatically categorizing documents into low- and high-retrievability classes based on document properties rather than complex retrievability analysis. We furthermore show that this classification can be used to improve the overall retrievability of documents by treating these classes as separate document corpora and combining the individual retrieval results. Experiments are validated on 1.2 million patents of the TREC Chemical Retrieval Track.
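The idea of treating low- and high-retrievability documents as separate corpora and combining their results can be illustrated with a minimal sketch. The abstract does not specify the retrieval model or the merging strategy, so everything here is an assumption made for illustration: the toy term-frequency `score`, the round-robin interleaving in `partitioned_search`, and the hand-assigned partitions (the paper derives the classes automatically from document properties).

```python
from collections import Counter

def score(query_terms, doc_text):
    """Toy term-frequency score (assumption): occurrences of query terms in the document."""
    counts = Counter(doc_text.lower().split())
    return sum(counts[t] for t in query_terms)

def search(corpus, query, k):
    """Return the top-k (doc_id, score) pairs with a non-zero score."""
    terms = query.lower().split()
    scored = [(doc_id, score(terms, text)) for doc_id, text in corpus.items()]
    hits = [(d, s) for d, s in scored if s > 0]
    # Sort by descending score, breaking ties by document id for determinism.
    hits.sort(key=lambda pair: (-pair[1], pair[0]))
    return hits[:k]

def partitioned_search(partitions, query, k):
    """Search each partition separately, then interleave the ranked lists round-robin.

    Round-robin merging is a hypothetical choice, not taken from the paper.
    """
    ranked = [search(p, query, k) for p in partitions]
    merged = []
    for rank in range(k):
        for hits in ranked:
            if rank < len(hits) and len(merged) < k:
                merged.append(hits[rank][0])
    return merged

# Hand-made example partitions (hypothetical data).
high = {"H1": "polymer polymer polymer coating", "H2": "polymer polymer resin"}
low = {"L1": "polymer film"}
combined = {**high, **low}

single = [doc_id for doc_id, _ in search(combined, "polymer", 2)]
split = partitioned_search([high, low], "polymer", 2)
```

In this toy setting, a single ranking over `combined` fills both result slots with the term-heavy documents, whereas the partitioned search reserves a slot for the document from the low-retrievability partition.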
Similar resources
Analyzing Document Retrievability in Patent Retrieval Settings
Most information retrieval settings, such as web search, are typically precision-oriented, i.e. they focus on retrieving a small number of highly relevant documents. However, in specific domains, such as patent retrieval or law, recall becomes more relevant than precision: in these cases the goal is to find all relevant documents, requiring algorithms to be tuned more towards recall at the cost...
Evaluating bias in retrieval systems for recall oriented documents retrieval
The evaluation of a retrieval system has always been the focus of research. Most of the retrieval systems seem to be more efficient for precision oriented documents than recall oriented documents since there is a difference between both the recall and precision oriented documents. Therefore, a system that is efficient for the retrieval of precision oriented documents does not need to be good fo...
On the relationship between query characteristics and IR functions retrieval bias
Bias quantification of retrieval functions with the help of document retrievability scores has recently evolved as an important evaluation measure for recall-oriented retrieval applications. While numerous studies have evaluated retrieval bias of retrieval functions, solid validation of its impact on realistic types of queries is still limited. This is due to the lack of well-accepted criteria f...
Evaluation of the Current DOE Document Conversion System: A Study of Retrievability
(UNLV) has been tasked to suggest improvements and evaluate the performance of the current DOE document conversion system. This report gives a summary of the recommendations made by ISRI staff and a summary of the results of two types of performance tests. There are two approaches to evaluating the performance of document conversion systems. One approach is to measure the accuracy of the text...
Retrieval Models versus Retrievability
Retrievability is an important measure in information retrieval that can be used to analyze retrieval models and document collections. Rather than just focusing on a set of few documents that are given in the form of relevance judgments, retrievability examines what is retrieved, how frequently it is retrieved, and how much effort is needed to retrieve it. Such a measure is of particular intere...
Journal title:
- Trans. Large-Scale Data- and Knowledge-Centered Systems
Volume 2
Publication date: 2010